A Boyer-Moore-style algorithm for regular expression pattern matching
Authors
Abstract
Richard E. Watson, Dept. of Mathematics, Simon Fraser University, Burnaby, B.C., Canada, watsona@sfu.ca
This paper presents a Boyer-Moore type algorithm for regular expression pattern matching, answering an open problem posed by A. V. Aho in 1980 [Aho80, p. 342]. The new algorithm handles patterns specified by regular expressions, a generalization of the Boyer-Moore and Commentz-Walter algorithms (which deal with patterns that are single keywords and finite sets of keywords, respectively). Like the Boyer-Moore and Commentz-Walter algorithms, the new algorithm makes use of shift functions which can be precomputed and tabulated. The precomputation algorithms are derived, and it is shown that the required shift functions can be precomputed from Commentz-Walter's shift functions, known as d1 and d2. In certain cases, the Boyer-Moore (Commentz-Walter) algorithm has greatly outperformed the Knuth-Morris-Pratt (Aho-Corasick) algorithm. In testing, the algorithm presented in this paper also frequently outperforms the regular expression generalization of the Aho-Corasick algorithm.
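To make the idea of a precomputed, tabulated shift function concrete, here is a minimal sketch for the single-keyword case only. It uses the simplified Horspool variant of Boyer-Moore (a bad-character shift keyed on the last character of the current window), not the regular expression algorithm of this paper and not Commentz-Walter's d1 and d2 functions. The names `build_shift_table` and `horspool_search` are illustrative assumptions.

```python
from typing import Dict, List


def build_shift_table(pattern: str) -> Dict[str, int]:
    """Precompute a Horspool-style shift table (illustrative helper).

    For each character in pattern[:-1], store the distance from its
    rightmost occurrence to the last position of the pattern.
    Characters absent from the table shift by the full pattern length.
    """
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return all start indices where pattern occurs in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift by the tabulated distance for the window's last character.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```

The table is built once per pattern and consulted on every window, which is the property the abstract refers to when it says the shift functions can be precomputed and tabulated. Windows whose last character does not occur in the pattern are skipped entirely.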
Similar resources
A Boyer-Moore (or Watson-Watson) Type Algorithm for Regular Tree Pattern Matching
In this paper, I outline a new algorithm for regular tree pattern matching. The Boyer-Moore family of string pattern matching algorithms is considered to be among the most efficient. The Boyer-Moore idea of a shift distance was generalized by Commentz-Walter for multiple keywords, and generalizations for regular expressions have also been found. The existence of a further generalization to tree ...
A Boyer-Moore Type Algorithm for Timed Pattern Matching
The timed pattern matching problem is formulated by Ulus et al. and has been actively studied since, with its evident application in monitoring realtime systems. The problem takes as input a timed word/signal and a timed pattern (specified either by a timed regular expression or by a timed automaton); and it returns the set of those intervals for which the given timed word, when restricted to t...
A Collection of New Regular Grammar Pattern Matching Algorithms
A number of new algorithms for regular grammar pattern matching are presented. The new algorithms handle patterns specified by regular grammars, a generalization of multiple keyword pattern matching and single keyword pattern matching, both considered extensively in and [14, Chapter 4] and in [18]. Among the algorithms is a Boyer-Moore type algorithm for regular grammar pattern matching, answeri...
Enhanced Pattern Matching Performance Using Improved Boyer Moore Horspool Algorithm
In computer science, the Boyer–Moore–Horspool algorithm is an algorithm for finding substrings in strings. Pattern matching approaches can be classified as software-based or hardware-based according to how they are implemented. It is important to enhance pattern matching performance. This paper proposes enhancing pattern matching performance using an improved Boyer–Moore–Horspool algorithm. It combines the determinis...
Accelerating Boyer Moore Searches on Binary Texts
The Boyer and Moore (BM) pattern matching algorithm is considered one of the best, but its performance is reduced on binary data. Yet, searching in binary texts has important applications, such as compressed matching. The paper shows how, by means of some pre-computed tables, one may implement the BM algorithm also for the binary case without referring to bits, and processing only entire blo...
Journal title:
- Sci. Comput. Program.
Volume 48, Issue
Pages -
Publication date: 2003